Introduction to the CoNLL-2000 Shared Task: Chunking
Abstract
We describe the CoNLL-2000 shared task: dividing text into syntactically related nonoverlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.
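As a rough illustration of what "dividing text into nonoverlapping groups of words" means in practice, the sketch below groups tokens into chunks from per-token labels. It assumes an IOB-style encoding (B-X opens a chunk of type X, I-X continues it, O marks a token outside any chunk), which is a common way to represent chunking data. The function name `iob_to_chunks` and the sample sentence and tags are illustrative choices, not taken from the abstract.

```python
from typing import List, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into non-overlapping (chunk_type, words) pairs.

    Assumes IOB-style tags: "B-NP", "I-NP", ..., or "O" for no chunk.
    """
    chunks: List[Tuple[str, List[str]]] = []
    prev_type = None  # chunk type of the previous token, None if it was outside a chunk
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prev_type = None
            continue
        prefix, chunk_type = tag.split("-", 1)
        # Open a new chunk on an explicit "B-" tag, or when the type changes
        # (this also covers an "I-" tag that follows an "O" token).
        if prefix == "B" or chunk_type != prev_type:
            chunks.append((chunk_type, [token]))
        else:
            chunks[-1][1].append(token)
        prev_type = chunk_type
    return chunks


# Hypothetical sample input, labeled by hand for illustration.
tokens = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow", "."]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP", "O"]

for chunk_type, words in iob_to_chunks(tokens, tags):
    print(f"[{chunk_type} {' '.join(words)}]")
```

Each token ends up in at most one chunk, which is the nonoverlapping property the task description refers to; the final period is left outside any chunk.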
Similar resources
Improving Chunking by Means of Lexical-Contextual Information in Statistical Language Models
In this work, we present a stochastic approach to shallow parsing. Most of the current approaches to shallow parsing have a common characteristic: they take the sequence of lexical tags proposed by a POS tagger as input for the chunking process. Our system produces tagging and chunking in a single process using an Integrated Language Model (ILM) formalized as Markov Models. This model integrate...
Hybrid Text Chunking
This paper describes an HMM-based chunk tagger and its extensions used in KRDL for the shared task of CoNLL'2000. Compared with a standard HMM-based tagger, this tagger incorporates more contextual information into a lexical entry. Moreover, an error-driven learning approach is adopted to decrease the memory requirement. It keeps only positive lexical entries which contribute to the error reductio...
Rule-Based Chunking and Reusability
In this paper we discuss a rule-based approach to chunking implemented using the LT-XML2 and LT-TTT2 tools. We describe the tools and the pipeline and grammars that have been developed for the task of chunking. We show that our rule-based approach is easy to adapt to different chunking styles and that the mark-up of further linguistic information such as nominal and verbal heads can be added to...
A Robust Risk Minimization based Named Entity Recognition System
This paper describes a robust linear classification system for Named Entity Recognition. A similar system has been applied to the CoNLL text chunking shared task with state-of-the-art performance. By using different linguistic features, we can easily adapt this system to other token-based linguistic tagging problems. The main focus of the current paper is to investigate the impact of various lo...